e8a642ed6a9ad20fb159472950db3d65-Supplemental.pdf

Neural Information Processing Systems

Methods for handling missing data have been extensively studied in the past few decades. Those methods can be roughly classified into two categories: complete case analysis (CCA) based methods and imputation-based methods. CCA-based methods, such as listwise deletion [1] and pairwise deletion [31], focus on deleting data instances that contain missing entries and keeping those that are complete. Standard techniques of single imputation include mean/zero imputation, regression-based imputation [1], and non-parametric methods [15, 54].

For the factorized prior p(Z|U) of the i-VAE component of GINA, we used a linear network with one auxiliary input (which is set to be the fully observed dimension, X1).
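The two families named in the excerpt above, listwise deletion and mean imputation, can be illustrated with a minimal sketch. This is not code from the paper: the function names `listwise_deletion` and `mean_impute`, the toy data, and the use of NaN to mark missing entries are all assumptions made for illustration.

```python
import math


def listwise_deletion(rows):
    """Keep only rows with no missing (NaN) entries (complete case analysis)."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def mean_impute(rows):
    """Replace each NaN with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: fall back to zero imputation if a column is fully missing.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: two variables, NaN marks a missing entry.
data = [
    [1.0, 2.0],
    [math.nan, 4.0],
    [3.0, math.nan],
    [5.0, 6.0],
]

complete_cases = listwise_deletion(data)  # drops the two incomplete rows
imputed = mean_impute(data)               # fills gaps with column means
```

In this toy case listwise deletion discards half of the rows, whereas mean imputation keeps all four but replaces each missing entry with a single value that ignores the other variable.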






Targeted Sequential Indirect Experiment Design

Neural Information Processing Systems

Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings experiments cannot be conducted directly on the target variables of interest and are instead indirect: they perturb the target variable but do not remove potential confounding factors. If, additionally, the resulting experimental measurements are high-dimensional and the studied mechanisms nonlinear, the query of interest is generally not identified. We develop an adaptive strategy to design indirect experiments that optimally inform a targeted query about the ground truth mechanism by sequentially narrowing the gap between an upper and a lower bound on the query. While the general formulation consists of a bi-level optimization procedure, we derive an efficiently estimable analytical kernel-based estimator of the bounds for the causal effect, a query of key interest, and demonstrate the efficacy of our approach in confounded, multivariate, nonlinear synthetic settings.